A Clinical Prognostic Model Based on Machine Learning from the Fondazione Italiana Linfomi (FIL) MCL0208 Phase III Trial

Simple Summary
The interest in using machine-learning (ML) techniques in clinical research is growing. We applied ML to build a novel prognostic model for patients with mantle cell lymphoma (MCL) enrolled in MCL0208, a phase III, open-label, randomized clinical trial from the Fondazione Italiana Linfomi (FIL). This is the first application of ML to a prospective clinical trial in MCL. We applied a novel ML pipeline to a large cohort of patients for whom several clinical variables had been collected at baseline, and assessed their prognostic value for overall survival. We validated the model on two independent data series provided by the European MCL Network. Owing to its flexibility, we believe that ML could be of great help in developing a novel MCL prognostic score aimed at redefining risk stratification.

Abstract
Background: Multicenter clinical trials are producing growing amounts of clinical data. Machine learning (ML) might facilitate the discovery of novel tools for prognostication and disease stratification. Taking advantage of a systematic collection of multiple variables, we developed a model derived from data collected on 300 patients with mantle cell lymphoma (MCL) from the Fondazione Italiana Linfomi-MCL0208 phase III trial (NCT02354313). Methods: We developed a score with a clustering algorithm applied to clinical variables. The candidate score was correlated with overall survival (OS) and validated in two independent data series from the European MCL Network (NCT00209222, NCT00209209). Results: Three groups of patients were significantly discriminated: Low, Intermediate (Int), and High risk (High). Seven discriminants were identified by a feature reduction approach: albumin, Ki-67, lactate dehydrogenase, lymphocytes, platelets, bone marrow infiltration, and B-symptoms. Accordingly, patients in the Int and High groups had shorter OS than those in the Low and Int groups, respectively (Int→Low, HR: 3.1, 95% CI: 1.0–9.6; High→Int, HR: 2.3, 95% CI: 1.5–4.7). Based on the 7 markers, we defined the engineered MCL international prognostic index (eMIPI), which was validated and confirmed in two independent cohorts. Conclusions: We developed and validated an ML-based prognostic model for MCL. Although currently limited to baseline predictors, our approach has high scalability potential.


Introduction
Prospective multicenter clinical trials are currently accumulating unprecedented amounts of information. The potential of these data remains underexploited, both for increasing our understanding of diseases and for improving our ability to discriminate their outcomes [1].
Although still in its infancy, the application of machine-learning (ML) tools in oncology and hematology is on the rise [2,3]. In acute myeloid leukemia, ML has been applied to drug discovery programs and gene expression profiling, leading to the discovery of novel predictive biomarkers [4][5][6]. Moreover, ML can be applied to develop prediction models of treatment response and its optimal timing [7,8], of hematopoietic stem cell transplantation outcomes [9][10][11][12], and of survival outcomes [13][14][15][16]. For example, Biccler et al. exploited registry data to develop several prognostic models for diffuse large B-cell lymphoma (DLBCL) [13]. Their ML approach identified clinical prognostic factors that performed better than the International Prognostic Index (IPI) in both the training and the validation sets.
Mantle cell lymphoma (MCL) is a highly heterogeneous disease. Some subtypes are aggressive and chemo-refractory, whereas others have shown prolonged survival after tailored treatment [17][18][19][20]. A number of prognostic models are currently available, most of which are related to the MCL international prognostic index (MIPI) [21][22][23][24][25]. The standard MIPI (MIPI-st) was developed by Hoster et al. [21] and has been refined and adapted over time.
Taking advantage of our experience with the MCL0208 clinical trial for young patients with MCL [26] (NCT02354313, sponsored by the Fondazione Italiana Linfomi [FIL]), we systematically collected and organized hundreds of clinical and biological variables in a previously generated data warehouse (DW) [1,27,28], which allowed careful quality assessments and substantially improved the accuracy of the results [29].
In the present study, we applied a hierarchical clustering algorithm to a large number of clinical variables from the DW, collected at baseline. We assessed their prognostic value on overall survival (OS) and, following the clustering analysis, we modeled a novel prognostic score, which we defined as the engineered MIPI (eMIPI). This was finally validated in two independent data series from the European MCL Network (NCT00209222, NCT00209209).

Patients
Data were collected from a phase III, multicenter, open-label, randomized, controlled clinical trial primarily aimed at determining the efficacy and safety of lenalidomide as a 2-year maintenance therapy after autologous stem cell transplantation (ASCT). The trial enrolled 303 younger (≤65 years) patients with MCL, all of whom received high-dose immunochemotherapy followed by ASCT [26]. The study was conducted in accordance with the Declaration of Helsinki, and all patients provided written informed consent for the collection and research use of clinical and biological data.

Data Preparation
Data preparation is described in the Supplementary Methods and Figure S1. We retrieved 34 available clinical features at baseline from electronic case report forms and laboratory data sources. These features included clinical (e.g., Eastern Cooperative Oncology Group parameters), laboratory (e.g., lactate dehydrogenase [LDH] below or above the upper limit of normal [ULN]), pathology (e.g., Ki-67 proliferation index), and demographic (age at diagnosis) variables.
Of these 34 features, 8 were excluded from the analysis because of the high number of missing values. Of the remaining 26, 17 were continuous and 9 were binary; to allow comparisons, the continuous variables were dichotomized according to established cut-offs:

• For age at diagnosis and for lymphoma involvement by flow cytometry on peripheral blood (flowPB), optimal cut-offs were determined by applying a spline function fitted via a logistic regression model, with PFS at the June 2019 data cut-off as the dependent variable.
Only patients without missing values were included in the training-set.
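The preparation steps above (exclusion of sparse features, dichotomization at cut-offs, and complete-case selection) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the 20% missingness threshold, the feature names, and the cut-off values are hypothetical, and the spline-based cut-off search for age and flowPB is not reproduced.

```python
# Hypothetical threshold: the text does not state how many missing values led to exclusion.
MAX_MISSING_FRACTION = 0.2


def drop_sparse_features(records, features, max_missing=MAX_MISSING_FRACTION):
    """Keep features whose fraction of missing values (None) does not exceed max_missing."""
    n = len(records)
    return [f for f in features
            if sum(1 for r in records if r.get(f) is None) / n <= max_missing]


def dichotomize(value, cutoff):
    """Return 1 if value exceeds the cut-off, 0 otherwise; a missing value stays None.

    For a variable where low values are abnormal (e.g. albumin), the comparison
    would have to be flipped; this sketch only encodes the "above cut-off" case.
    """
    if value is None:
        return None
    return 1 if value > cutoff else 0


def prepare(records, continuous_cutoffs, binary_features, max_missing=MAX_MISSING_FRACTION):
    """Drop sparse features, binarize continuous ones, and keep complete cases only."""
    candidates = list(continuous_cutoffs) + list(binary_features)
    features = drop_sparse_features(records, candidates, max_missing)
    rows = []
    for r in records:
        row = {f: (dichotomize(r.get(f), continuous_cutoffs[f])
                   if f in continuous_cutoffs else r.get(f))
               for f in features}
        if all(v is not None for v in row.values()):  # complete-case selection
            rows.append(row)
    return features, rows
```

Only the rows returned by `prepare` would enter the clustering step, mirroring the complete-case training set described above.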

Clustering Analysis and Features Reduction
Clustering analysis was performed on complete data to discriminate different groups of patients based on their baseline features (Figure S2). We applied a hierarchical algorithm with Ward linkage and Euclidean distance. The cluster analysis was implemented in MATLAB R2020a (version 9.8.0.1359463; The MathWorks, Natick, MA, USA) with the Bioinformatics Toolbox.
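A minimal pure-Python sketch of agglomerative clustering with Ward linkage on Euclidean distance is shown below, assuming each patient is a vector of 0/1 features. At each step it merges the two clusters whose union gives the smallest increase in within-cluster sum of squares, which is the Ward criterion. It is deliberately naive (it recomputes centroids at every step) and only illustrates the idea; it is not the MATLAB implementation used in the study, and in practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="ward"` would be used instead.

```python
from itertools import combinations


def centroid(points):
    """Mean vector of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares caused by merging clusters a and b.

    Ward criterion: n_a * n_b / (n_a + n_b) * ||centroid_a - centroid_b||^2.
    """
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def ward_clusters(rows, k):
    """Agglomerative clustering (Ward linkage, Euclidean distance) cut at k clusters.

    Returns one integer cluster label per row; ties are broken by the first pair found.
    """
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: ward_cost([rows[m] for m in clusters[p[0]]],
                                    [rows[m] for m in clusters[p[1]]]),
        )
        clusters[i].extend(clusters[j])  # j > i, so index i is unaffected by the deletion
        del clusters[j]
    labels = [0] * len(rows)
    for label, members in enumerate(clusters):
        for m in members:
            labels[m] = label
    return labels
```

Cutting the tree at `k = 3` would correspond to the three clusters (C1, C2, C3) reported in the Results.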
The resulting groups of patients were then correlated with clinical outcomes, and the best model was identified using metrics that allow comparison between survival models: the concordance (C)-index [32], -2*log-likelihood (-2LL), and the Akaike (AIC) and Bayesian (BIC) Information Criteria. The best model was then chosen for further analytical steps.
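To make these model-comparison metrics concrete, here is a minimal Python sketch of Harrell's C-index and of -2LL, AIC, and BIC computed from a fitted model's log-likelihood. It assumes one risk score per patient (higher means worse prognosis) and, as a simplification, does not count pairs with tied event times; dedicated survival software handles ties more carefully. The number of observations used in BIC is a convention that differs between tools (some use the number of events for Cox models), so `n_obs` is left to the caller.

```python
import math


def harrell_c(times, events, risks):
    """Harrell's concordance index.

    A pair (i, j) is comparable when subject i had the event strictly before time j.
    The pair is concordant when i also has the higher predicted risk; ties in risk count 0.5.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def information_criteria(loglik, n_params, n_obs):
    """Return (-2LL, AIC, BIC) for a fitted model."""
    minus_2ll = -2.0 * loglik
    aic = minus_2ll + 2 * n_params
    bic = minus_2ll + n_params * math.log(n_obs)
    return minus_2ll, aic, bic
```

A higher C-index is better (0.5 corresponds to random ordering), whereas lower -2LL, AIC, and BIC values indicate a better fit, with AIC and BIC penalizing the number of parameters.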
To select a clinically applicable set of variables, we first applied a statistical bivariate feature reduction (as detailed in the Supplementary Methods). For the final feature selection, we applied a Recursive Feature Extraction algorithm (RFE, Figure S2F) implemented in R (R Foundation for Statistical Computing, Vienna, Austria, https://www.r-project.org). Cross-validation was used as the resampling method: the training set was randomly divided into 10 parts, and each part in turn was used as the testing dataset for a Random Forest model trained on the other 9 (10-fold cross-validation). This procedure was repeated five times, and model accuracy was assessed as the average of the 5 resulting error terms. Based on the most accurate model, we selected the number of most influential features, and on this basis we defined the eMIPI score (Figures S2G and S5).
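The selection scheme can be sketched as repeated 10-fold cross-validation (5 repeats) combined with backward elimination, which is how recursive feature elimination is commonly described. In this minimal Python sketch the Random Forest is deliberately abstracted away: `fit_predict_accuracy`, `score_subset`, and `rank_features` are hypothetical callbacks that a real implementation would back with a random forest's cross-validated accuracy (error = 1 - accuracy) and its variable importances.

```python
import random


def repeated_kfold(n, k=10, repeats=5, seed=0):
    """Yield (repeat, train_idx, test_idx) for `repeats` rounds of k-fold CV over n samples (n >= k)."""
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[f::k] for f in range(k)]
        for test in folds:
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            yield r, train, test


def cv_accuracy(n, fit_predict_accuracy, k=10, repeats=5, seed=0):
    """Mean fold accuracy within each repeat, then averaged over the repeats."""
    per_repeat = [[] for _ in range(repeats)]
    for r, train, test in repeated_kfold(n, k, repeats, seed):
        per_repeat[r].append(fit_predict_accuracy(train, test))
    return sum(sum(accs) / len(accs) for accs in per_repeat) / repeats


def recursive_feature_elimination(features, score_subset, rank_features):
    """Backward elimination: repeatedly drop the least important feature, keep the best-scoring subset.

    score_subset(subset) -> higher is better (e.g. cv_accuracy of a model on that subset).
    rank_features(subset) -> dict of feature -> importance for a model fitted on that subset.
    """
    current = list(features)
    best_score, best_subset = score_subset(current), list(current)
    while len(current) > 1:
        importances = rank_features(current)
        weakest = min(current, key=lambda f: importances[f])
        current.remove(weakest)
        score = score_subset(current)
        if score > best_score:
            best_score, best_subset = score, list(current)
    return best_score, best_subset
```

Re-ranking at every step (rather than ranking once) is a design choice of this sketch; implementations differ on this point.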

Survival Analysis
Survival analyses of the training set were performed according to eMIPI class, using both multivariate Cox and Kaplan-Meier (K-M) methods (survival data cut-off: June 2019). The eMIPI classification was then compared with previously recognized prognostic models: the MIPI-st, according to Hoster et al. [21], the MIPI-biological (MIPI-b) [21], and the MIPI-c [22] (Figure S2H). The models were compared by assessing the C-index, -2LL, AIC, and BIC. The outcome analysis, Cox modeling, and performance of each model were implemented with the "survival" (v. 2.44-1.1) and "stats" (v. 3.6.2) packages provided with R. We followed the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis Or Diagnosis (TRIPOD) criteria.
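For readers less familiar with the K-M method, a minimal pure-Python sketch of the product-limit estimator follows. It assumes follow-up times with an event indicator (1 = death, 0 = censored) and is meant only to show how the curve is built, not to replace the R "survival" package used in the study.

```python
def kaplan_meier(times, events):
    """Product-limit estimate. Returns [(t, S(t))] at each distinct event time.

    At each event time t: S(t) = S(t-) * (1 - deaths_at_t / number_at_risk_at_t).
    Censored subjects leave the risk set without changing S.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group all subjects sharing time t
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def survival_at(curve, t):
    """Step-function lookup: the last estimate at or before t (1.0 before the first event)."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s
```

A 5-year survival probability, such as those reported per eMIPI group in the Results, would correspond to `survival_at(kaplan_meier(times, events), 5)` computed within each group, with times in years.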

Extrapolation of a Simplified eMIPI Score for External Validation
Reproducible formulas were implemented to assign patients to eMIPI prognostic groups (Figure S2I) according to each patient's profile. The full set of patient profiles was extracted from the heat-map, and each profile was assigned a class according to outcome.
Next, we externally validated the eMIPI on a trial cohort of "Younger" patients from the European MCL Network, which was comparable to the FIL-MCL0208 discovery cohort (Figure S2J). We also validated the eMIPI on a trial cohort of "Elderly" patients from the European MCL Network, and we explored the prognostic value of the eMIPI for clinical outcomes by comparing it with that of the MIPI-st, MIPI-b, and MIPI-c. Validation methods are detailed in the Supplementary Methods.

Patient Characteristics
Demographic and clinical characteristics of the 300 eligible patients are summarized in Table 1 [26].
Overall, 185 patients were included in the training set. For OS, the median follow-up was 4.7 years (interquartile range [IQR]: 4.3-5.2 years). For progression-free survival (PFS), the median follow-up was 4.8 years (IQR: 4.3-5.3), and the five-year PFS was 52%. The OS probabilities of patients included in and excluded from (N = 115) the training set were superimposable (Figure S3).
According to the MIPI-st, we classified 110 (59%) patients as Low risk, 53 (29%) patients as intermediate risk (Int), and 22 (12%) patients as High risk. According to the MIPI-b, we classified 49 (26%) patients as Low risk, 87 (47%) patients as Int risk, and 49 (26%) patients as High risk. Finally, according to the MIPI-c, we classified 91 (49%) patients as Low risk, 49 (26%) patients as Int-Low risk, 28 (15%) patients as Int-High risk, and 17 (10%) patients as High risk. Figure 1 shows the heat-map constructed from the clustering analysis of the training set. The horizontal dendrogram results from the clustering of patients, while the vertical dendrogram outlines the clustering of patient characteristics. This analysis allowed us to define three clusters (C) of patients: C1 (n = 92, 50%), C2 (n = 45, 24%), and C3 (n = 48, 26%). A correlation analysis between each group and the clinical outcomes


Survival Analysis
Using the simplified model, we constructed K-M survival curves with patients stratified into the C1, C2, and C3 groups. The three groups had significantly different OS risks; hence, they were renamed in terms of the eMIPI as Low, Int, and High, respectively. Figure 3A shows the K-M curves of OS for the three eMIPI groups. The cumulative survival probabilities at 5 years were 0.94, 0.83, and 0.58 for the Low, Int, and High eMIPI groups, respectively (Figure 3B). Patients with High eMIPI values had a significantly lower OS than those with Int (HR: 2.32, 95% CI: 1.14-4.73, p = 0.025) and Low eMIPI values (HR: 7.09, 95% CI: 2.46-20.48, p < 0.001).

Patient Profiles According to eMIPI
To create a simple prognostic tool for validation on an external cohort series, we analyzed each patient profile obtained from the cluster analysis (a total of fifty-five possible profiles), representing every eMIPI class (Table S2). The simplification rules derived from these profiles are shown in Table 2.
Most patient profiles could be readily assigned to one of the three main groups (Low, Int, or High). In some cases, profiles that could not be assigned to either the Low or the High risk group were assigned to the Int risk group (Table 2, formula 8).
Briefly, according to the heat-map, patients with abnormal albumin were always classified as High risk. Additionally, some patients with normal albumin were classified as High risk on the basis of abnormal values for the remaining features (Table 2, formulas 3-7).
Notably, we tested each simplified formula individually, comparing the resulting eMIPI risk class with clinical outcomes to verify the correctness of each formula. A K-M survival analysis confirmed that the formulas provided consistent classifications, as expected from Figure 3A.
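The ordered-formula logic described in this subsection (abnormal albumin always gives High, and any profile not captured by a Low or High formula falls back to Int, as in formula 8) can be sketched as a first-match rule list over binary profiles. Only the albumin rule and the Int fallback come from the text; the Low rule below is a hypothetical placeholder, the actual criteria being those listed in Table 2, and the feature identifiers are illustrative.

```python
from itertools import product

# Illustrative identifiers for the seven eMIPI features (1 = abnormal, 0 = normal).
FEATURES = ("albumin", "ki67", "ldh", "lymphocytes", "platelets", "bm_infiltration", "b_symptoms")


def rule_albumin(p):
    # From the text: abnormal albumin always yields High risk.
    return "High" if p["albumin"] else None


def rule_low_hypothetical(p):
    # HYPOTHETICAL placeholder for a Low-risk formula; the real criteria are in Table 2.
    return "Low" if not any(p[f] for f in FEATURES) else None


RULES = [rule_albumin, rule_low_hypothetical]


def classify(profile, rules=RULES):
    """First matching rule wins; profiles matched by no rule fall back to Int (formula 8)."""
    for rule in rules:
        label = rule(profile)
        if label is not None:
            return label
    return "Int"


def all_profiles():
    """Every binary profile over the seven features (2**7 = 128 combinations)."""
    for values in product((0, 1), repeat=len(FEATURES)):
        yield dict(zip(FEATURES, values))
```

Because rules are evaluated in order, placing the albumin rule first guarantees that any profile with abnormal albumin is classified as High regardless of later rules. Of the 128 possible binary profiles, the text reports 55 obtained from the cluster analysis.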

eMIPI Comparison with Recognized Scores
We compared the eMIPI classification with three currently recognized indexes for predicting OS: the MIPI-st, the MIPI-b, and the MIPI-c. All indexes were tested on the same subset of patients.

Feature Reduction
The final model selection fulfilled the clinical requirement of obtaining a signature of a few clinical variables that are easily derived from patient characteristics (Supplementary Results).

Taken together, the eMIPI produced the most balanced groups of patients (Low risk: 31%, Int risk: 30%, and High risk: 39%) compared with the distributions produced by the MIPI-st (Low: 59%, Int: 29%, and High: 17%) and the MIPI-b (Low: 26%, Int: 62%, and High: 10%).

External Validation
We next sought to validate the eMIPI approach by applying it to the external patient series from the "Younger" and "Elderly" trials of the European MCL Network [18,33]. For the "Younger" cohort, 254 of 613 patients were selected for the comparative analysis. Of note, selected and excluded patients did not differ significantly in median survival (10.0 vs. 11.0 years) (Figure S6). In contrast, a significant difference in median survival (9.1 vs. 6.9 years) was observed between selected and excluded patients when pooling the "Younger" and "Elderly" series (Figure S7). No difference was observed when comparing the excluded patients from the "Younger" and "Elderly" series (59% vs. 60%).

Comparison between the Simplified and Starting Models
We compared the starting model, which included all 26 features (Figure 1), with the simplified model composed of only seven features (Figure 2). The simplified model slightly outperformed the starting model, which included the whole set of variables, in predicting OS.

Table 2. Manual reduction of rules to obtain the smallest set that could correctly classify all the patients.

When surveying the prognostic value in the "Elderly" cohort (Figure 5B), the eMIPI-discriminated groups comprised 57 (eMIPI Low, 22%), 77 (eMIPI Int, 29%), and 129 (eMIPI High, 49%) patients. Similarly, in this cohort eMIPI High patients displayed significantly lower OS than patients from both the Int (HR: 1.90, 95% CI: 1.17-3.

When pooling together the "Younger" and "Elderly" series (Figure 5C), we observed that patients with High eMIPI had lower OS than those with Int eMIPI (HR: 1.80, 95% CI: 1.12-2.80) and Low eMIPI (HR: 2.20, 95% CI: 0.92-5.50). The eMIPI also retained its prognostic value relative to the recognized scores in this series (data not shown).

Discussion
In this study, we developed an ML-based prognostic model to create a new MCL risk score, named eMIPI. The ML modeling approach included (i) clustering analysis using classical dendrograms and (ii) feature reduction using a Random Forest algorithm, applied to the FIL-MCL0208 cohort of 300 patients (of whom 185 formed the training set). Finally, the robustness of our prognostic model was further validated using data from two large independent trials [18,33].
The application of ML approaches in the hematology field is rapidly growing, although most ML studies are retrospective [7,11,13,14,[34][35][36], based on data retrieved from electronic health records at either single centers or multiple centers. For example, Agius et al. developed a ML pipeline based on data for 4149 patients retrieved from the Danish Chronic Lymphocytic Leukemia (CLL) registry. Those data allowed the construction of a very accurate treatment-infection model of CLL [35].
Clinical trials rarely enroll as many patients as are typically analyzed in retrospective series. However, trials often contain larger sets of variables and offer superior data quality compared with retrospective series. This was particularly evident in the FIL-MCL0208 trial, whose data underwent rigorous refinement, accurate feature assessments, and uniform evaluations of clinical outcomes through the DW-based data handling method [1]. Therefore, although the model proposed here did not take into account the full panel of data available from the eCRFs, it should be considered a first step in implementing reliable ML algorithms [37] in the context of a clinical trial.
Starting with thirteen baseline variables retrieved from a national registry, Biccler et al. showed that ML was useful in finding the most predictive model of risk among twelve supervised models for newly diagnosed DLBCL patients treated with rituximab, cyclophosphamide, doxorubicin, vincristine, and prednisolone (R-CHOP) or R-CHOP like therapy [8]. In our analysis, we started with thirty-four variables (Supplementary Figure S2A) as input for an unsupervised algorithm. Thus, data variability, when correctly handled, can allow the development of novel prognostic scores. Indeed, Kurtz et al. showed that a model that combined clinical data (IPI index), interim imaging risk factors, and circulating tumor DNA risk factors, outperformed each factor taken individually for predicting event-free survival among patients with DLBCL [38].
Overall, 38% of patients were excluded from the training set because of the high number of missing values (Figure S1). This step was needed for the clustering analysis, which runs only on complete data. Nonetheless, we found no evidence of selection bias, as the clinical outcomes of included and excluded patients were superimposable (Figure S3). In addition, we applied an unsupervised methodology that combined several variables from different sources. To allow comparison with binary variables, each continuous variable was iteratively dichotomized according to either recognized ranges or clinical outcomes (e.g., age at diagnosis and flowPB variables).
The FIL-MCL0208 DW contained a large number of variables. We chose to limit this first modeling effort to a subset of only 26 easily accessible variables for two reasons: (1) we needed to validate the model with an independent series that did not include all the biological features measured in the training set; and (2) prognostic scores based on easily accessible clinical variables offer greater opportunities owing to their broad applicability. However, we believe that models built on more complex datasets will soon be feasible. Such studies will increase our knowledge of MCL biology and allow clinicians to choose the most robust biological predictors tailored to each case.
Unlike the recognized prognostic scores for MCL, the eMIPI includes albumin levels (which might reflect the inflammatory status and hepatic synthesis at diagnosis), B symptoms (included in the basic diagnostic workup for MCL), and bone marrow (BM) tumor infiltration and altered platelet (PLT) levels (both possibly related to high tumor burden). Interestingly, abnormal albumin levels alone are sufficient to assign a patient to the High risk profile.
Moreover, in both the training and validation series, the eMIPI allocated a larger proportion of patients to the High risk group than the recognized scores did for patients of comparable age. This finding is important, considering that MCL is still a frequently relapsing disease, and future trials that aim to test personalized treatment intensification will benefit from prognosticators that can identify a considerable proportion of patients at High risk. To broadly promote the clinical usefulness of the eMIPI tool, we implemented an easy-to-use calculator on the FIL website (http://filinf.it/eMIPI, accessed on 29 October 2021).
A partial limitation of this study is that the eMIPI did not outperform the MIPI-st and MIPI-b when pooling the "Younger" and "Elderly" patients from the European MCL Network. However, although the eMIPI was derived from a cohort of young patients with MCL, it retained its prognostic value in a large trial of older patients. Thus, our results suggest that the variables chosen in our model are likely to retain good predictive value, regardless of the potential confounding roles of age- and frailty-associated parameters.

Conclusions
This study provides proof of principle that ML can be a useful tool for prognostic modeling in lymphoma clinical trials. In the future, the eMIPI might be integrated with biological and time-dependent variables.
To fully exploit the potential of ML-based modeling, data might be pooled from several clinical trials with similar characteristics, and additional variables could be included. Application of the same principles to other disease entities might also be feasible.
Supplementary Materials: The following are available online at http://www.mdpi.com/xxx/s1. Figure S1: Pipeline for data pre-processing, Figure S2: Flow diagram for preparation and validation of e-MIPI score, Figure S3: OS probability of patients included vs. patients excluded from training-set, Figure S4: Multicollinear analysis according to Spearman, Figure S5: Recursive feature extraction, Figure S6: Validation Series: MCL Younger, Figure S7: Validation Series: MCL Younger and Elderly, Table S1: Bivariate analysis, Table S2: Patients' profiles, Table S3: Patients' characteristics from the external validation series; Table S4: Power estimation in the validation cohort: validation series according to each cohort, Table S5: Descriptive statistics in the validation cohort: MCL Younger series, Table S6: Descriptive statistics in the validation cohort: MCL Younger and Elderly series, Supplementary Methods: Data Preparation. Data pre-processing: clustering analysis and feature reduction. Validation, Supplementary Results: Feature reduction. Validation.